Fast Average-Case Pattern Matching on Weighted Sequences
Authors
Abstract
A weighted string over an alphabet of size σ is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain sequences, naturally arise in many contexts. In this article, we study the problem of weighted string matching with a special focus on average-case analysis. Given a weighted pattern string x of length m, a text string y of length n > m, and a cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in a weighted string, we present an algorithm requiring average-case search time o(n) for pattern matching for weight ratio $\frac{z}{m} < \min\left\{ \frac{1}{\log z}, \frac{\log \sigma}{\log z \, (\log m + \log\log \sigma)} \right\}$. For a pattern string x of length m, a weighted text string y of length n > m, and a cumulative weight threshold 1/z, we present an algorithm requiring average-case search time o(σn) for the same weight ratio. The importance of these results lies in the fact that, for these ratios, the algorithms run in average-case search time sublinear in the size of the text, with preprocessing time and space linear in the size of the pattern.
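To make the definitions in the abstract concrete, here is a minimal sketch of the second setting (an ordinary pattern searched in a weighted text). It is a naive O(nm) baseline, not the average-case sublinear algorithm of the paper. It assumes the usual convention that positions are independent, so the occurrence probability of a factor is the product of the per-position letter probabilities. The names `WeightedString`, `occurrence_probability` and `naive_weighted_match` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Assumption: a weighted string is a list of positions, each mapping
# a letter to its occurrence probability at that position.
WeightedString = List[Dict[str, float]]


def occurrence_probability(pattern: str, text: WeightedString, start: int) -> float:
    """Probability that `pattern` occurs in `text` at index `start`,
    assuming independent positions (product of letter probabilities)."""
    prob = 1.0
    for offset, letter in enumerate(pattern):
        prob *= text[start + offset].get(letter, 0.0)
        if prob == 0.0:
            break  # the letter cannot occur here, so the factor cannot occur
    return prob


def naive_weighted_match(pattern: str, text: WeightedString, z: float) -> List[int]:
    """Return every start index where `pattern` occurs with probability
    at least the cumulative weight threshold 1/z. Naive O(n*m) scan."""
    threshold = 1.0 / z
    m, n = len(pattern), len(text)
    return [
        i
        for i in range(n - m + 1)
        if occurrence_probability(pattern, text, i) >= threshold
    ]


if __name__ == "__main__":
    # Illustrative weighted text of length 4 over the alphabet {A, C, G, T}.
    y = [
        {"A": 1.0},
        {"A": 0.5, "C": 0.5},
        {"G": 1.0},
        {"A": 0.25, "T": 0.75},
    ]
    print(naive_weighted_match("AG", y, z=2))
    print(naive_weighted_match("GA", y, z=4))
```

In this toy input, raising z lowers the threshold 1/z and therefore admits less probable occurrences, which is why the bound in the abstract is stated in terms of z relative to the pattern length m.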
Similar resources
I-45: Advance MRI Sequences in Pelvic Endometriosis
Background: To assess MRI in diagnosing endometriotic lesions, with emphasis on the efficacy of T2*-weighted imaging. Materials and Methods: This prospective study included 48 females (22-38 years, average 29.6) clinically suspected of endometriosis from September 2009 to April 2012. MRI was performed with a 1.5 T imager (Siemens) with a body array coil. T1-, T2- and T2*-weighted (2D-FLASH) sequences were obtained ...
Two simple heuristics for the pattern matching on weighted sequences
Weighted sequences are used as profiles for protein families, in the representation of binding sites, and for sequences produced by DNA shotgun sequencing assembly. In this paper we present two simple heuristics for the pattern matching on weighted sequences. One is a simple heuristic which enables a faster validation between a weighted candidate and a weighted text. The other is applying the bad...
A Fast Generic Sequence Matching Algorithm
A string matching (and, more generally, sequence matching) algorithm is presented that has a linear worst-case computing time bound, a low worst-case bound on the number of comparisons (2n), and sublinear average-case behavior that is better than that of the fastest versions of the Boyer-Moore algorithm. The algorithm retains its efficiency advantages in a wide variety of sequence matching problems...
Pattern Matching on Weighted Sequences
Weighted sequences are used extensively as profiles for protein families, in the representation of binding sites and often for the representation of sequences produced by a shotgun sequencing strategy. We present various fundamental pattern matching problems on weighted sequences and their respective algorithms. In addition, we define two matching probabilistic measures and we give algorithms f...
On the Average-case Complexity of Pattern Matching with Wildcards
In this paper we present a number of fast average-case algorithms for pattern matching with wildcards. We consider the problems where wildcards are restricted to either the pattern or the text; however, the results can be easily adapted to the case where wildcards are allowed in both. We analyse the algorithms' average-case complexity and their expected-case complexity and show new lower bounds ...
Journal title:
- CoRR
Volume abs/1512.01085, Issue -
Pages -
Publication date 2015